Understanding the Importance of Taxonomic Sampling for Large-scale Phylogenetic Analyses by Simulating Evolutionary Processes under Complex Models
Abstract
Appropriate and extensive taxon sampling is one of the most important determinants of accurate phylogenetic estimation. In addition, the accuracy of inferences about evolutionary processes obtained from phylogenetic analyses is improved significantly by thorough taxon sampling. Much of the previous work examining the impact of taxon sampling on phylogenetic accuracy has focused on the effects of random taxon sampling or directed taxon addition/removal. Therefore, the effect of realistic, nonrandom taxon sampling strategies on the accuracy of large-scale phylogenetic reconstruction is not well understood. Typically, broad systematic studies of diverse clades select species according to current classification to span the diversity within the group of interest. I simulated phylogenies under a realistic model of cladogenesis and used these trees to generate sequence data. Using these simulations, I explored the effect of taxonomy-based taxon sampling on the accuracy of maximum likelihood reconstruction. The results demonstrate that taxonomy-based sampling has a stronger negative effect on phylogenetic accuracy than random taxon sampling. Therefore, it is recommended that systematists conducting phylogenetic analyses of diverse clades concentrate on improving sampling density within their group of interest by selecting multiple representatives from each taxonomic level. Phylogenetic tree imbalance is often used to make inferences about the macroevolutionary processes that generate patterns of tree shape. However, these patterns may be obscured by non-biological factors that can bias tree shape. Using published trees inferred from biological data and trees simulated under a realistic branching model, I investigated the effect of random taxon omission on phylogenetic tree imbalance.
My results indicate that incomplete taxon sampling in the presence of variable rates of speciation and extinction may be sufficient to explain much of the imbalance observed in empirical phylogenies. Previous research has indicated that some methods of phylogenetic inference can produce biased tree topologies and shapes. Using simulated model tree topologies and sequence data, I investigated the non-biological factors that lead to biases in phylogenetic tree imbalance. Based on my results, I concluded that phylogenetic noise is the primary cause of tree shape bias. Methods that account for unobserved substitutions, such as maximum likelihood, can overcome the systematic bias toward imbalanced topologies.
Figure 2.1: The relationship between weighted mean imbalance and node size for four sets of trees simulated under a range of variance parameters ...
Figure 3.2: The nodal imbalance for the combined collection of empirical trees and the collection of trees simulated under varying rates of speciation and ...
Similar resources
Quantitative Comparison of Tree Pairs Resulted from Gene and Protein Phylogenetic Trees for Sulfite Reductase Flavoprotein Alpha-Component and 5S rRNA and Taxonomic Trees in Selected Bacterial Species
Introduction: FAD is the cofactor of the FAD-FR protein family. Sulfite reductase flavoprotein alpha-component is one of the main enzymes of this family. Because of its applications in biotechnology and industry, this enzyme was chosen as the subject of evolutionary studies in 19 specific species. Method: Gene and protein sequences of sulfite reductase flavoprotein alpha-component, 5S rRNA sequence...
Contribution to the molecular systematics of the genus Capoeta from the south Caspian Sea basin using mitochondrial cytochrome b sequences (Teleostei: Cyprinidae)
Traditionally, Capoeta populations from the southern Caspian Sea basin have been considered to be Capoeta capoeta gracilis. A study of the phylogenetic relationships of Capoeta species using mitochondrial cytochrome b gene sequences shows that the Capoeta population from the southern Caspian Sea basin is a distinct species and is well supported (posterior probability of 100%). Based on the tree topologie...
The subjective nature of Linnaean categories and its impact in evolutionary biology and biodiversity studies
Absolute (Linnaean) ranks are essential to rank-based nomenclature (RN), which has been used by the vast majority of systematists for the last 150 years. They are widely recognized as being subjective among taxonomists, but not necessarily in other fields. For this reason, phylogenetic nomenclature (PN) and other alternative nomenclatural systems have been developed. However, reluctance to acce...
The systematic component of phylogenetic error as a function of taxonomic sampling under parsimony.
The effect of taxonomic sampling on phylogenetic accuracy under parsimony is examined by simulating nucleotide sequence evolution. Random error is minimized by using very large numbers of simulated characters. This allows estimation of the consistency behavior of parsimony, even for trees with up to 100 taxa. Data were simulated on 8 distinct 100-taxon model trees and analyzed as stratified sub...